than $1 - 1/e$

Abstract

Consider any discrete memoryless channel (DMC) with arbitrary but finite input and output alphabets $\mathcal{X}$ and
$\mathcal{Y}$, respectively. Then, for any capacity achieving input distribution, all symbols occur less frequently
than $1 - 1/e$. That is,

$$\max_{x \in \mathcal{X}} P^*(x) < 1 - \frac{1}{e},$$

where $P^*(x)$ is a capacity achieving input distribution. Also, we provide sufficient conditions under which a discrete
distribution can be a capacity achieving input distribution for some DMC. Lastly, we show that there is no
similar restriction on the capacity achieving output distribution.

Index Terms 

Channel Capacity, Discrete Memoryless Channels (DMC). 

I. Introduction 

For an arbitrary discrete probability distribution, under what circumstances can we find a discrete memoryless 
channel (DMC) for which the given distribution achieves the capacity of that channel? Is it always possible to find 
a channel for any arbitrary distribution? As surprising as it might seem, the answer is negative. That is, there exist 
probability distributions that can never be capacity achieving distributions for any discrete memoryless channel. 
More precisely, the main result of this work is that a source distribution that transmits a symbol with probability
greater than or equal to $1 - 1/e$ can never be a capacity achieving distribution.

The result stated above leads to the following natural question: is there a similar restriction on the capacity
achieving output distribution of a discrete memoryless channel? Using a dual characterization of the channel capacity,
we argue that every probability distribution can be the capacity achieving output distribution of some
channel.

Last but not least, we ask whether there exist simple sufficient conditions for an arbitrary probability
distribution to be a capacity achieving distribution for some channel. Consider an input probability distribution $P(x)$.
If there is a subset of symbols whose probabilities sum to a value in the interval $(1/e, 1 - 1/e)$, then there exists
a channel for which the distribution $P(x)$ is capacity achieving.

The paper is organized as follows. In Section II we review some important definitions and introduce our
notation. In Section III we present the starting point of this work, and in Section IV we extend that result to the
case of discrete memoryless channels with two inputs and multiple outputs. Then, in Section V we present the most
general result for an arbitrary discrete memoryless channel with multiple inputs and multiple outputs, and we state
a dual information geometric result. Lastly, in Section VI we present sufficient conditions for an input distribution
to be capacity achieving for a DMC.

II. Preliminaries 

We require the following definitions Q. 

Definition. A discrete channel, denoted by $(\mathcal{X}, P(y|x), \mathcal{Y})$, consists of two finite sets
$\mathcal{X}$ and $\mathcal{Y}$ and a collection of probability mass functions $P(y|x)$, one for each
$x \in \mathcal{X}$, such that $P(y|x) \geq 0$ for every $x$ and $y$, and, for every $x$,

$$\sum_{y \in \mathcal{Y}} P(y|x) = 1.$$

A channel is memoryless if

$$P(y^n | x^n) = \prod_{i=1}^{n} P(y_i | x_i) \tag{1}$$

for all $n \geq 1$.

Definition. We define the information channel capacity of a discrete memoryless channel
$(\mathcal{X}, P(y|x), \mathcal{Y})$ as

$$C = \max_{P(x)} I(X;Y) \tag{2}$$

where $I(X;Y)$ denotes the mutual information and the maximum is taken over all possible input distributions
$P(x)$.

Denote by

$$P^*(x) = \arg\max_{P(x)} I(X;Y) \tag{3}$$

$$Q^*(y) = \sum_{i=1}^{|\mathcal{X}|} P^*(x_i) P(y|x_i) \tag{4}$$

the capacity achieving input distribution and the capacity achieving output distribution, respectively.

Note that there may exist more than one capacity achieving input distribution for a given channel, but they all 
induce the same capacity achieving output distribution. Also, we do not consider trivial channels with capacity zero. 

The capacity of a discrete memoryless channel can be calculated using the following dual representation due to
Gallager and Ryabko [2]. Consider a DMC $(\mathcal{X}, P(y|x), \mathcal{Y})$. The capacity is given by

$$C = \min_{Q} \max_{x \in \mathcal{X}} D(P(\cdot|x) \,\|\, Q(y)). \tag{5}$$

This means that the capacity of a channel can be interpreted as the radius of the smallest information ball containing
the rows $P(\cdot|x)$, $x \in \mathcal{X}$, of the channel matrix. The minimizing $Q$ is the center of this ball, and
it is the capacity achieving output distribution.
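The dual representation in Eq. (5) can be checked numerically on a small example. The sketch below assumes an illustrative 2-input, 2-output channel `W` (not taken from the paper) and uses plain grid searches, in nats: the primal value maximizes $I(X;Y)$ over the input probability, and the dual value minimizes the ball radius $\max_x D(P(\cdot|x)\|Q)$ over $Q = (t, 1-t)$. All names are hypothetical.

```python
import numpy as np

# Illustrative channel (an assumption, not from the paper); rows are P(.|x).
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def kl(p, q):
    """D(p || q) in nats for strictly positive vectors."""
    return float(np.sum(p * np.log(p / q)))

def mutual_information(a):
    """I(X;Y) in nats when row 0 is used with probability a."""
    q = a * W[0] + (1.0 - a) * W[1]
    return a * kl(W[0], q) + (1.0 - a) * kl(W[1], q)

# Primal: maximize I(X;Y) over the input distribution (a, 1 - a).
a_grid = np.linspace(0.0, 1.0, 20001)
primal_vals = np.array([mutual_information(a) for a in a_grid])
a_star = float(a_grid[int(np.argmax(primal_vals))])
c_primal = float(primal_vals.max())

# Dual: minimize over Q = (t, 1 - t) the radius max_x D(P(.|x) || Q).
t_grid = np.linspace(1e-6, 1.0 - 1e-6, 20001)
dual_vals = np.array([
    max(kl(W[0], np.array([t, 1.0 - t])), kl(W[1], np.array([t, 1.0 - t])))
    for t in t_grid
])
t_star = float(t_grid[int(np.argmin(dual_vals))])
c_dual = float(dual_vals.min())

# Output distribution induced by the primal optimizer (first coordinate).
q0_primal = a_star * W[0, 0] + (1.0 - a_star) * W[1, 0]

print(f"primal C = {c_primal:.5f} nats, dual C = {c_dual:.5f} nats")
print(f"Q*(0) from primal = {q0_primal:.4f}, ball center t* = {t_star:.4f}")
```

Up to grid resolution, the two values should agree, and the center of the ball should coincide with the output distribution induced by the optimal input.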
III. Binary Input, Binary Output

For the binary-input binary-output channel it is possible to obtain a simple analytical solution for the capacity
achieving distribution in terms of the transition probabilities [1]. Using these formulas, it turns out that for a binary
channel, neither symbol should be transmitted with probability greater than or equal to $1 - 1/e$ in order to achieve
capacity. Note that there is no DMC for which the distribution

$$P_0(x) = \begin{cases} \frac{1}{e} & \text{if } x = 0 \\ 1 - \frac{1}{e} & \text{if } x = 1 \end{cases} \tag{6}$$

is capacity achieving. However, there exists the Z-channel of Fig. 2 with capacity achieving distribution

$$P(x) = \begin{cases} \frac{1}{e} + \epsilon(\delta) & \text{if } x = 0 \\ 1 - \frac{1}{e} - \epsilon(\delta) & \text{if } x = 1 \end{cases} \tag{7}$$

that has capacity $C = C(\delta)$, where $C(\delta) \to 0$ and $\epsilon(\delta) \to 0$ as $\delta \to 0$. Therefore, $P_0(x)$ is never an optimal input
distribution for a non-trivial discrete memoryless channel.
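The limiting behaviour of the Z-channel can be illustrated numerically. The sketch below assumes one particular parametrization, which is an assumption since the figure is not reproduced here: input 0 is received as output 0 with probability `delta` and as output 1 otherwise, while input 1 is always received as output 1. With $a = \Pr\{X = 0\}$, the mutual information in nats is $H_b(a\delta) - a H_b(\delta)$, and a ternary search over $a$ locates its maximizer.

```python
import math

def xlogx(t):
    """t * log(t) with the convention 0 log 0 = 0."""
    return t * math.log(t) if t > 0 else 0.0

def hb(t):
    """Binary entropy in nats."""
    return -xlogx(t) - xlogx(1.0 - t)

def z_mutual_information(a, delta):
    """I(X;Y) for the assumed Z-channel, with a = Pr{X = 0}."""
    return hb(a * delta) - a * hb(delta)

def z_optimal_input(delta, iters=200):
    """Ternary search for the maximizer of the concave function a -> I(X;Y)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if z_mutual_information(m1, delta) < z_mutual_information(m2, delta):
            lo = m1
        else:
            hi = m2
    a = 0.5 * (lo + hi)
    return a, z_mutual_information(a, delta)

# Each entry is (delta, optimal Pr{X = 0}, capacity in nats).
table = [(d, *z_optimal_input(d)) for d in (1.0, 0.5, 0.1, 0.01, 0.001)]
for d, a, c in table:
    print(f"delta={d:<6} a*={a:.6f}  C={c:.6f}  (1/e={1 / math.e:.6f})")
```

Under this parametrization the optimal probability of the noisy symbol stays above $1/e$ and approaches it as the capacity vanishes, which matches the form of Eq. (7).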



Fig. 2: Z-Channel with capacity approaching zero.



Next, we generalize this result to the case of a DMC with $|\mathcal{Y}| \geq 2$.

IV. Binary Input, Multiple Output

In this section, we prove that increasing the output alphabet size for a binary input does not change our
restriction that the input symbol probabilities must belong to the interval $(1/e, 1 - 1/e)$.

Lemma 1. The function

$$f(a; p_1, p_2) = p_1 \log\frac{p_1}{a p_1 + \bar{a} p_2} - p_2 \log\frac{p_2}{a p_1 + \bar{a} p_2} - (p_1 - p_2), \tag{8}$$

where $\bar{a} = 1 - a$ and $p_1, p_2, a \in [0, 1]$, satisfies

$$f\left(\tfrac{1}{e}; p_1, p_2\right) \geq 0 \tag{9}$$

with equality iff $p_1 = p_2$ or $p_2 = 0$.

Proof. See Appendix.
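A quick grid evaluation gives a numerical sanity check of Lemma 1. The sketch assumes natural logarithms, the convention $0 \log 0 = 0$, and the form $f(a; p_1, p_2) = p_1 \log\frac{p_1}{a p_1 + \bar{a} p_2} - p_2 \log\frac{p_2}{a p_1 + \bar{a} p_2} - (p_1 - p_2)$, which is the form consistent with the derivatives given in the Appendix. The grid size and the comparison point $a = 0.4$ are arbitrary illustrative choices.

```python
import math

def f(a, p1, p2):
    """f(a; p1, p2) in nats, with the convention 0 log 0 = 0."""
    q = a * p1 + (1.0 - a) * p2
    t1 = p1 * math.log(p1 / q) if p1 > 0 else 0.0
    t2 = p2 * math.log(p2 / q) if p2 > 0 else 0.0
    return t1 - t2 - (p1 - p2)

n = 200
grid = [i / n for i in range(n + 1)]
a0 = 1.0 / math.e

# Smallest value of f(1/e; p1, p2) over the grid on [0, 1] x [0, 1].
min_value = min(f(a0, p1, p2) for p1 in grid for p2 in grid)

# For a slightly larger than 1/e the inequality already fails at p2 = 0.
tight_example = f(0.4, 1.0, 0.0)

print(f"min over grid of f(1/e; p1, p2) = {min_value:.3e}")
print(f"f(0.4; 1, 0) = {tight_example:.4f}")
```

The negative value at $a = 0.4$ suggests that $1/e$ is the largest weight for which the inequality holds for all $p_1, p_2$.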



Theorem 1. Let $(\mathcal{X}, P(y|x), \mathcal{Y})$ be a two input discrete memoryless channel with transition matrix

$$\begin{bmatrix} P_1(y) \\ P_2(y) \end{bmatrix}$$

where $P_i(j) = P(y = j | x = i)$ corresponds to the conditional probability of receiving symbol $j$ given that symbol
$i$ is transmitted. We assume that $|\mathcal{Y}| \geq 2$. Let $a^* := \arg\max_{0 \leq a \leq 1} I(X;Y)$, where $a = \Pr\{X = 1\}$.
Then

$$\frac{1}{e} < a^* < 1 - \frac{1}{e}. \tag{10}$$

Fig. 3: $f(\tfrac{1}{e}; p_1, p_2) \geq 0$ for $p_1, p_2 \in [0, 1]$.



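Proof. The capacity $C$ of the DMC is given by

$$C = \max_{P(x)} I(X;Y) \tag{11}$$

$$\phantom{C} = \max_{0 \leq a \leq 1} I(X;Y) \tag{12}$$

where the maximization is taken over all input distributions $P(x) = \Pr\{X = x\}$, $x \in \{0, 1\}$, $a = \Pr\{X = 1\}$ and
$\bar{a} = 1 - a$.

Note that since $I(X;Y)$ is a concave function of $a$,

$$\frac{dI(X;Y)}{da} = D(P_1 \| Q_a) - D(P_2 \| Q_a) \tag{13}$$

is a non-increasing function of $a$. It suffices then to prove that

$$\left.\frac{dI(X;Y)}{da}\right|_{a = 1/e} > 0 \quad \text{and} \quad \left.\frac{dI(X;Y)}{da}\right|_{a = 1 - 1/e} < 0,$$

because in that case, by the intermediate value theorem, the solution $a^*$ of $\frac{dI(X;Y)}{da} = 0$ satisfies

$$a^* \in \left(\frac{1}{e}, 1 - \frac{1}{e}\right). \tag{14}$$

To this end, we have that

$$I(X;Y) = D(P(x,y) \,\|\, P(x)P(y)) \tag{15}$$

$$\phantom{I(X;Y)} = \sum_{x} \sum_{y=1}^{|\mathcal{Y}|} P(x) P(y|x) \log\frac{P(y|x)}{P(y)} \tag{16}$$

$$\phantom{I(X;Y)} = \sum_{y=1}^{|\mathcal{Y}|} \left[ a P_1(y) \log\frac{P_1(y)}{Q_a(y)} + \bar{a} P_2(y) \log\frac{P_2(y)}{Q_a(y)} \right]. \tag{17}$$

Note that

$$\frac{dI(X;Y)}{da} = D(P_1 \| Q_a) - D(P_2 \| Q_a) \tag{18}$$

where $Q_a(y) = a P_1(y) + \bar{a} P_2(y)$, and that

$$D(P_1 \| Q_a) - D(P_2 \| Q_a) = \sum_{y=1}^{|\mathcal{Y}|} f(a; P_1(y), P_2(y)) \tag{19}$$

where $f$ is defined in Eq. (8); the linear terms cancel in the sum because $P_1$ and $P_2$ both sum to one. Notice that by Lemma 1

$$\sum_{y=1}^{|\mathcal{Y}|} f\left(\tfrac{1}{e}; P_1(y), P_2(y)\right) \geq 0 \tag{20}$$

with equality iff $P_1(y) = P_2(y)$ for all $y \in \mathcal{Y}$. When $P_1(y) = P_2(y)$ for all $y \in \mathcal{Y}$, we have $C = 0$, a trivial case that we ignore.
Therefore,

$$\sum_{y=1}^{|\mathcal{Y}|} f\left(\tfrac{1}{e}; P_1(y), P_2(y)\right) > 0 \;\Longrightarrow\; D(P_1 \| Q_{1/e}) - D(P_2 \| Q_{1/e}) > 0. \tag{21}$$

By interchanging $P_1$ and $P_2$ above we get that

$$D(P_1 \| Q_{1 - 1/e}) - D(P_2 \| Q_{1 - 1/e}) < 0. \tag{22}$$

From Eq. (21) and (22) it follows that

$$\left.\frac{dI(X;Y)}{da}\right|_{a = 1/e} > 0 \quad \text{and} \quad \left.\frac{dI(X;Y)}{da}\right|_{a = 1 - 1/e} < 0.$$

Therefore, by the intermediate value theorem, the solution $a^*$ of $\frac{dI(X;Y)}{da} = 0$ satisfies

$$a^* \in \left(\frac{1}{e}, 1 - \frac{1}{e}\right). \tag{23}$$

This completes the proof. □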
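The two sign conditions used in the proof, and the resulting location of $a^*$, can be spot-checked on randomly drawn binary-input channels. The sketch below is a numerical illustration only: rows are sampled from a flat Dirichlet distribution (an arbitrary choice that keeps all entries positive), divergences are in nats, and $a^*$ is found by bisection on the non-increasing derivative $D(P_1\|Q_a) - D(P_2\|Q_a)$.

```python
import numpy as np

def derivative(a, P1, P2):
    """dI/da = D(P1 || Q_a) - D(P2 || Q_a) in nats, for strictly positive rows."""
    Q = a * P1 + (1.0 - a) * P2
    return float(np.sum(P1 * np.log(P1 / Q)) - np.sum(P2 * np.log(P2 / Q)))

def optimal_a(P1, P2, iters=80):
    """Bisection for the root of the non-increasing derivative on (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if derivative(mid, P1, P2) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(1)
inv_e = 1.0 / np.e
results = []
for _ in range(200):
    k = int(rng.integers(2, 8))                 # output alphabet size
    P1, P2 = rng.dirichlet(np.ones(k), size=2)  # two random rows
    results.append((derivative(inv_e, P1, P2),
                    derivative(1.0 - inv_e, P1, P2),
                    optimal_a(P1, P2)))

a_values = [r[2] for r in results]
print(f"min a* = {min(a_values):.4f}, max a* = {max(a_values):.4f}")
print(f"interval = ({inv_e:.4f}, {1 - inv_e:.4f})")
```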
Fig. 4: Geometric interpretation of Corollary 1.

Corollary 1. Let $P_1(y)$ and $P_2(y)$ be any two probability distributions on $\mathcal{Y}$. Let $a \in [0, 1]$ be chosen such that
$D(P_1 \| Q_a) = D(P_2 \| Q_a)$, where $Q_a(y) = a P_1(y) + (1 - a) P_2(y)$. Then $a \in (1/e, 1 - 1/e)$, where $e$ is the base of
the natural logarithm.

In Fig. 4 we show a geometric representation of Corollary 1. For any two distributions $P_1$ and $P_2$, the distribution
$Q_a$ that satisfies $D(P_1 \| Q_a) = D(P_2 \| Q_a)$ lies in the interval shown in the figure.

Corollary 2. Let $P_1(y)$ and $P_2(y)$ be any two probability distributions on $\mathcal{Y}$ ($P_1 \neq P_2$). Let
$Q_{1/e}(y) = \frac{1}{e} P_1(y) + (1 - \frac{1}{e}) P_2(y)$. Then,

$$D(P_1 \| Q_{1/e}) - D(P_2 \| Q_{1/e}) > 0. \tag{24}$$

Corollary 2 has an interesting interpretation in an information geometric sense. Specifically, consider any two
distributions $P_1$ and $P_2$. The distribution $Q_a$ lies on the line segment that connects $P_1$ and
$P_2$. The farthest that $Q_a$ can be from $P_2$ and still always be closer to $P_2$ than to $P_1$ is for $a = 1/e$.

Corollary 3. Consider a cost constraint $l(x)$ on the input symbols such that $l(0) = 0$ and $l(1) = 1$. Let
$a = \Pr(X = 1)$. Define the following problem:

$$C_\beta = \max_{a \in [0, 1]} I(X;Y) \quad \text{subject to} \quad E[l(X)] \leq \beta. \tag{25}$$

If $\beta < 1/e$ then $a^* = \beta$.

As an example of the application of Corollary 3, consider a binary DMC where the fraction of 1's is constrained to
be at most, say, 20%. According to Theorem 1, the unconstrained capacity achieving fraction of 1's is at least $1/e$
(around 36.8%). Then from Corollary 3 we conclude that the capacity achieving fraction of 1's under the constraint is exactly 20%.
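The 20% example can be reproduced numerically. The sketch below assumes an illustrative binary-input channel `W` (not from the paper) and a simple grid search: it maximizes $I(X;Y)$ over $a = \Pr\{X = 1\}$ first on $[0, 1]$ and then on the constrained range $[0, 0.2]$. Because $I(X;Y)$ is concave in $a$ and its unconstrained maximizer exceeds $1/e$, the constrained maximizer should sit on the boundary.

```python
import numpy as np

# Illustrative binary-input channel (an assumption); W[x] is the row P(.|x).
W = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def mutual_information(a):
    """I(X;Y) in nats with a = Pr{X = 1}; rows of W are strictly positive."""
    Q = (1.0 - a) * W[0] + a * W[1]
    return float((1.0 - a) * np.sum(W[0] * np.log(W[0] / Q))
                 + a * np.sum(W[1] * np.log(W[1] / Q)))

beta = 0.2  # at most 20% of the transmitted symbols may be 1's

full_grid = np.linspace(0.0, 1.0, 10001)
a_unconstrained = float(
    full_grid[int(np.argmax([mutual_information(a) for a in full_grid]))])

constrained_grid = np.linspace(0.0, beta, 2001)
a_constrained = float(
    constrained_grid[int(np.argmax([mutual_information(a) for a in constrained_grid]))])

print(f"unconstrained a* = {a_unconstrained:.4f}")
print(f"constrained a*   = {a_constrained:.4f} (beta = {beta})")
```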
V. Maximum Symbol Probability for a Multiple Input, Multiple Output DMC

We now extend the above result to show that, for any capacity achieving input distribution of a discrete
memoryless channel, no input symbol can ever have a probability greater than or equal to $1 - 1/e$.

Theorem 2. Consider any discrete memoryless channel (DMC) $(\mathcal{X}, P(y|x), \mathcal{Y})$ with input alphabet
$\mathcal{X} := \{0, 1, 2, \ldots, m - 1\}$ and output alphabet $\mathcal{Y} := \{0, 1, 2, \ldots, n - 1\}$. Let $P(x)$ be the input distribution over the alphabet $\mathcal{X}$.
Define $P^*(x) := \arg\max_{P(x)} I(X;Y)$. Then

$$\max_{0 \leq x \leq m - 1} P^*(x) < 1 - \frac{1}{e}. \tag{26}$$



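Proof. To show that the $1 - 1/e$ result extends beyond the binary input case, without loss of generality we need to
prove that $\Pr\{X = 0\} < 1 - 1/e$. Define a function $f(x)$ on our input:

$$f(x) = \mathbb{1}\{x \neq 0\} = \begin{cases} 0 & \text{if } x = 0 \\ 1 & \text{if } x \neq 0 \end{cases} \tag{27}$$

Denote $\Pr\{f(X) = 0\} = a$. Also, denote $\bar{a} = 1 - a$.
The capacity $C$ of the channel is given by

$$\begin{aligned}
C &= \max_{P(x)} I(X;Y) \\
&\overset{(a)}{=} \max_{P(x, f(x))} I(X, f(X); Y) \\
&= \max_{P(x, f(x))} \left[ I(f(X); Y) + I(X; Y | f(X)) \right] \\
&\overset{(b)}{=} \max_{P(x, f(x))} \left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right] \\
&= \max_{P(x | f(x)) P(f(x))} \left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right] \\
&\overset{(c)}{=} \max_{P(x | f(x))} \max_{a} \left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right]
\end{aligned} \tag{28}$$

where

• (a) follows from the fact that $f(X)$ is a deterministic function of $X$.

• (b) follows from the fact that when $f(X) = 0$, $H(X | f(X) = 0) = 0$ and thus $I(X; Y | f(X) = 0) = 0$.

• (c) follows because the choice of $P(f(x))$ can be made independently of $P(x | f(x))$, so we can split the
maximization.

Let $\alpha_1 = \arg\max_a I(f(X); Y)$. Fix $P(x | f(x))$. Then, from Theorem 1,

$$\alpha_1 < 1 - \frac{1}{e} \tag{29}$$

and we also know that

$$I(f(X); Y)\big|_{a = \alpha_1} \geq I(f(X); Y)\big|_{a > \alpha_1} \tag{30}$$

$$\bar{a} I(X; Y | f(X) = 1)\big|_{a = \alpha_1} \geq \bar{a} I(X; Y | f(X) = 1)\big|_{a > \alpha_1} \tag{31}$$

The first inequality follows from the definition of $\alpha_1$, and the second inequality follows easily because
$\bar{a} I(X; Y | f(X) = 1)$ is a linear, non-increasing function of $a$.
Adding inequalities (30) and (31) we get that

$$\left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right]\big|_{a = \alpha_1} \geq \left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right]\big|_{a > \alpha_1}. \tag{32}$$

From inequality (32) we conclude that

$$\arg\max_{a} \left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right] \leq \alpha_1 \tag{33}$$

$$\Longrightarrow \arg\max_{a} \left[ I(f(X); Y) + \bar{a} I(X; Y | f(X) = 1) \right] < 1 - \frac{1}{e}. \tag{34}$$

Note that the latter holds for any $P(x | f(x))$.

Thus,

$$a = \Pr\{X = 0\} < 1 - \frac{1}{e}. \tag{35}$$

This completes the proof. □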
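The bound of Theorem 2 can be spot-checked numerically. The sketch below swaps in the Blahut-Arimoto iteration, a standard numerical method that is not part of the paper's argument, to approximate a capacity achieving input distribution for randomly drawn channels. The channel sizes, the flat Dirichlet sampling, and the iteration count are arbitrary illustrative choices.

```python
import numpy as np

def blahut_arimoto(W, iters=3000):
    """Approximate a capacity achieving input distribution for a channel W
    whose rows P(.|x) are strictly positive. Returns (p, capacity in nats)."""
    m = W.shape[0]
    p = np.full(m, 1.0 / m)
    for _ in range(iters):
        q = p @ W                                  # induced output distribution
        d = np.sum(W * np.log(W / q), axis=1)      # D(P(.|x) || q) for each x
        p = p * np.exp(d)                          # multiplicative update
        p /= p.sum()
    q = p @ W
    d = np.sum(W * np.log(W / q), axis=1)
    return p, float(p @ d)

rng = np.random.default_rng(0)
bound = 1.0 - 1.0 / np.e
worst = 0.0
for _ in range(20):
    m = int(rng.integers(2, 6))                    # number of inputs
    n = int(rng.integers(2, 6))                    # number of outputs
    W = rng.dirichlet(np.ones(n), size=m)          # random channel matrix
    p_star, capacity = blahut_arimoto(W)
    worst = max(worst, float(p_star.max()))

print(f"largest symbol probability seen: {worst:.4f}")
print(f"bound 1 - 1/e:                   {bound:.4f}")
```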



In Fig. 5 we show the region of optimal input distributions $P^*(x)$ (dark region) on the 3-simplex that satisfy the
constraint $\max_{0 \leq x \leq m - 1} P(x) < 1 - 1/e$.

Fig. 5: Space of optimal input distributions for a 3-3 DMC.



The above result induces the following constraints for the capacity achieving output distribution.

Corollary 4. Let $P_1(y), P_2(y), P_3(y), \ldots, P_m(y)$ be any $m$ distributions on $\mathcal{Y}$. Let $Q^*(y)$ be the center of the
information ball of minimum radius that contains the distributions $P_1, P_2, \ldots, P_m$ on the $|\mathcal{Y}|$-simplex.
That is, let $Q^*(y)$ be the solution of the optimization problem

$$\begin{aligned}
\min_{Q(y)} \quad & C \\
\text{subject to} \quad & D(P_i \| Q) \leq C, \quad 1 \leq i \leq m \\
& Q(y) = \sum_{i=1}^{m} a_i P_i(y) \\
& \sum_{i=1}^{m} a_i = 1
\end{aligned} \tag{36}$$

Then $a_i < 1 - 1/e$, $1 \leq i \leq m$.



Proof. By the min-max duality theorem for capacity [2], $C$ is the capacity of the DMC whose rows are the
distributions $P_i$, and $a_i$ is the probability of the input symbol $i$. Applying Theorem 2 we get that

$$a_i < 1 - \frac{1}{e}. \qquad \square$$



Fig. 6 shows a sketch of the region in which the capacity achieving output distribution $Q^*(y)$ lies. $P_1(y)$, $P_2(y)$
and $P_3(y)$ are the rows of the channel matrix.

Fig. 6: Space of optimal output distributions for a DMC with rows $P_1(y)$, $P_2(y)$ and $P_3(y)$.

Remark. The capacity achieving output distribution $Q^*(y)$ can lie anywhere on the $|\mathcal{Y}|$-simplex. However, given the
channel matrix, $Q^*(y)$ lies in the region described by Corollary 4 and shown in Fig. 6.

VI. Sufficient Conditions for Optimality 

So far we have shown that for any discrete memoryless channel with finite input alphabet $\mathcal{X}$ and finite output
alphabet $\mathcal{Y}$, the capacity achieving distribution $P^*(x)$ must satisfy $P^*(x) < 1 - 1/e$ for all $x \in \mathcal{X}$. In this section we show
that if there exists a subset of symbols whose probabilities under $P(x)$ sum to a value in $(1/e, 1 - 1/e)$, then there exists a channel for which this distribution
is a capacity achieving input distribution.

Theorem 3. Let $P(x)$ be a discrete input probability distribution over a discrete memoryless channel $(\mathcal{X}, P(y|x), \mathcal{Y})$
with $\mathcal{X} := \{0, 1, 2, \ldots, m - 1\}$. If $\exists\, S \subset \mathcal{X} : \sum_{x \in S} P(x) \in (1/e, 1 - 1/e)$, then there exists a channel for which $P(x)$ is
a capacity achieving input distribution.



Proof. We are going to construct a channel for which the distribution $P(x)$ is a capacity achieving input distribution.
Let

$$p = \sum_{x \in S} P(x). \tag{37}$$

Then $p \in (1/e, 1 - 1/e)$. Also, denote $q = 1 - p$.

We now construct a 2-input, 2-output channel with transition matrix

$$P_1(y|x) = \begin{bmatrix} P(y | x = 0) \\ P(y | x = 1) \end{bmatrix} \tag{38}$$

such that the distribution

$$Q(x) = \begin{cases} p & \text{if } x = 0 \\ q & \text{if } x = 1 \end{cases} \tag{39}$$

is capacity achieving. Such a channel always exists because, for a 2-input 2-output DMC, the formulas for the capacity
achieving input distribution [1] are continuous in the transition probabilities and $p \in (1/e, 1 - 1/e)$. This
allows us to conclude that any binary input distribution which satisfies the $1 - 1/e$ constraint has a channel for which that
distribution is capacity achieving.

Starting from the transition matrix $P_1(y|x)$, construct a channel in which the second row is cloned $m - 2$ times:

$$P_2(y|x) = \begin{bmatrix} P(y | x = 0) \\ P(y | x = 1) \\ P(y | x = 1) \\ \vdots \\ P(y | x = 1) \end{bmatrix} \tag{40}$$

Obviously, for this channel any distribution that has

$$P(x = 0) = p \tag{41}$$

$$\sum_{i=1}^{m-1} P(x = i) = q \tag{42}$$

is a capacity achieving input distribution, since all the rows of the matrix except the first are exactly the same. Note
that the probability distribution $P(x)$ satisfies equations (41) and (42), and therefore it is a capacity achieving input
distribution for the channel with transition matrix $P_2(y|x)$. This completes the proof. □
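The construction can be sketched numerically for a concrete target. The example below makes several assumptions beyond the text: the target distribution `P` is illustrative, the subset is the single symbol $S = \{0\}$ with $p = 0.45 < 1/2$, and the binary channel is taken from a Z-channel family (input 0 reaches output 0 with probability `s`, input 1 always gives output 1) whose parameter is tuned by bisection. Optimality is verified through the standard condition that every input with positive probability has the same divergence $D(P(\cdot|x)\|Q)$ to the induced output distribution.

```python
import numpy as np

def kl(p, q):
    """D(p || q) in nats with the convention 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def z_rows(s):
    """Assumed Z-channel family: input 0 -> output 0 w.p. s; input 1 -> output 1."""
    return np.array([s, 1.0 - s]), np.array([0.0, 1.0])

def gap(s, p):
    """D(row0 || Q) - D(row1 || Q) when input 0 is used with probability p."""
    r0, r1 = z_rows(s)
    Q = p * r0 + (1.0 - p) * r1
    return kl(r0, Q) - kl(r1, Q)

P = np.array([0.45, 0.35, 0.20])   # illustrative target input distribution
p = float(P[0])                    # subset S = {0}; p lies in (1/e, 1/2)

# Bisection on s: the gap is negative for small s and positive near s = 1.
lo, hi = 1e-6, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if gap(mid, p) < 0:
        lo = mid
    else:
        hi = mid
s_star = 0.5 * (lo + hi)

# Clone the second row m - 2 times, as in Eq. (40).
r0, r1 = z_rows(s_star)
W = np.vstack([r0] + [r1] * (len(P) - 1))
Q = P @ W
divergences = np.array([kl(row, Q) for row in W])

print(f"s* = {s_star:.6f}")
print("D(P(.|x) || Q) per input:", np.round(divergences, 9))
```

Equal divergences across all inputs indicate that `P` is capacity achieving for the constructed channel; for $p > 1/2$ the roles of the two binary inputs would be exchanged.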
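VIII. Appendix

Proof of Lemma 1.

We must prove that $f(\tfrac{1}{e}; p_1, p_2) \geq 0$. To this end, we compute the first and second derivatives of $f(\tfrac{1}{e}; p_1, p_2)$ with respect to
$p_1$:

$$\frac{\partial f(\tfrac{1}{e}; p_1, p_2)}{\partial p_1} = \log\frac{p_1}{\frac{1}{e} p_1 + (1 - \frac{1}{e}) p_2} - \frac{\frac{1}{e}(p_1 - p_2)}{\frac{1}{e} p_1 + (1 - \frac{1}{e}) p_2}$$

$$\frac{\partial^2 f(\tfrac{1}{e}; p_1, p_2)}{\partial p_1^2} = \frac{p_2 \left( (e - 1)^2 p_2 - p_1 \right)}{p_1 \left( p_1 + (e - 1) p_2 \right)^2}$$

Note that when $p_1 = p_2$, $\frac{\partial f}{\partial p_1} = 0$ and $\frac{\partial^2 f}{\partial p_1^2} > 0$. Therefore, for a fixed $p_2 = c$, $p_1 = c$ is a local minimum of the
function $f(\tfrac{1}{e}; p_1, p_2)$. Further, $f(\tfrac{1}{e}; p_1, c)$ is convex for $p_1 < (e - 1)^2 c$ and concave for $p_1 > (e - 1)^2 c$. Therefore,
the global minimum of the function $f(\tfrac{1}{e}; p_1, c)$ with respect to $p_1$ should occur either at $p_1 = 1$ or at $p_1 = c$. Note
also that $f(\tfrac{1}{e}; c, c) = 0$. Also, one can easily verify using calculus that $f(\tfrac{1}{e}; 1, c) \geq 0$ for all $c \in [0, 1]$. Thus, it follows
that $f(\tfrac{1}{e}; p_1, p_2) \geq 0$.

Lastly, it is easy to verify that the conditions of equality are $p_1 = p_2$ or $p_2 = 0$.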
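As a brief check of the two equality cases, assume natural logarithms, the convention $0 \log 0 = 0$, and that $f$ carries the linear term $-(p_1 - p_2)$ (the form consistent with the derivatives above). Both cases can then be evaluated directly:

```latex
\begin{align*}
f\left(\tfrac{1}{e}; c, c\right)
  &= c \log\frac{c}{c} - c \log\frac{c}{c} - (c - c) = 0, \\
f\left(\tfrac{1}{e}; p_1, 0\right)
  &= p_1 \log\frac{p_1}{p_1 / e} - 0 - (p_1 - 0)
   = p_1 \log e - p_1 = 0.
\end{align*}
```

The second line is where the constant $1/e$ enters: for any weight $a > 1/e$ the same evaluation gives $p_1(-\log a - 1) < 0$.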



